1. INTRODUCTION

Mining patterns in professional relations, time series, heritable archives, and several other classes
of data is a basic step in formulating hypotheses and discovering links among items. For example, market
basket analysis in the retail business aims to determine which products are bought together in
order to capture the buying habits of consumers or clients, to recognise their requirements, to improve
cross-promotional campaigns, to attract new consumers or, more generally, to develop corporate routines. To
do this, experts usually look at the most frequent patterns, although examining the least frequent
co-occurrences can also deliver interesting insights about smaller but no less interesting groups [1].


1.1. Background 

Various researchers have designed algorithms for generating rare itemsets. The rare item problem is
one of the significant challenges in association analysis. The theoretical study of the rare item problem in
association analysis was started by Liu, Hsu, and Ma [2]. They proposed a multiple minimum-support
approach in which every item in the data set has its own minimum item support (MIS). MIS is
computed by comparing a lowest permissible support with the support of the item times a parameter, β.
In this way, rare items have smaller minimum supports than common items, so they are not
overlooked in the rule generation process. This approach has two key drawbacks. First, specifying
MIS values when the number of items in the dataset is large is tedious work, and second, determining the
optimum value of β is not easy. To address these concerns, Szathmary and Valtchev [3] proposed ARIMA. It
first uses Apriori-Rare, which finds the minimal rare itemsets. ARIMA then processes these minimal rare
itemsets and outputs the rare itemsets. The key benefit is that this approach is able to find rare itemsets
without generating zero itemsets. However, it is based on two thresholds, i.e. minimum support
and maximum support.

A transaction mapping algorithm was designed by Song and Sanguthevar [4]. The main benefit of
this algorithm is that it can generate rare itemsets. However, it relies on the MRGExp and
Apriori-Rare algorithms. In [5], Arnab Das highlighted the significance of rare itemsets in some areas.
He proposed an approach for finding the non-zero rare itemsets and generated interesting patterns
from various itemsets. Yun, Ha, Hwang, and Ryu [6] introduced relative support. Their procedure does not
involve the factor β, so they did not face the task of determining its optimum value.

Geevlin and Mala [7-8] proposed an algorithm for mining frequent itemsets and producing rare
items from various databases. In this study, two core problems with current methods were identified.
The mining process is carried out with statistical methods such as the Poisson distribution.

Kiran and Re [9] proposed an improved multiple minimum-support method for mining rare
association rules. Their method requires specifying multiple minimum supports, which is more difficult than
specifying a single adjusted minimum support. Srikant and Agrawal, in 1997, produced a small modification of
the classical algorithm. In this study, the rare items are also selected on the basis of the minimum support
value. However, it fails to find all the rare itemsets.

Several investigators have applied association rules to data on diabetic patients. Piri et al [10] analysed the
data of 23,17,259 patients diagnosed with hypertension and diabetes. By applying association analysis,
they found that hypertension and diabetes were strongly associated. Mustafa et al [11] explored various
algorithms for generating association rules. They analysed the algorithms on benchmark dense datasets and found
that the FP-Growth algorithm performs better than the others. Prasad [12] designed a new algorithm whose aim
was to eliminate low-profit itemsets. The algorithm takes little time to generate high-profit itemsets. Its
limitation is that it is unable to generate rare itemsets; it may also be extended to big datasets. Hussain
et al [13] used the Apriori algorithm to generate association rules and analysed the academic performance of
students using WEKA.

To improve the quality of rare item extraction, a technique named “Multiple Minimum Support
Model” was developed by Darrab et al [14]. In this method, every item is assigned a minimum support value
called “Minimum Item Support” (MIS). This value is set for every item to a fraction of
its support. The approach defines the minimum support of a rare rule in terms of the minimum item supports of
the items that appear in the rule, i.e. every item in the database can have a minimum item support that can be
computed by some method or specified by the user. By giving different MIS values to
different items, the user effectively states different support requirements for different rules. This algorithm
uses the downward closure property. According to this property, every subset of a frequent itemset is frequent.
If this property is used, various interesting items may be ignored or discarded.
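
The downward closure property can be illustrated with a minimal Python sketch. The toy transactions, the item names, and the helper names `support_pct` and `all_subsets_frequent` are hypothetical and serve only as an illustration; they are not taken from [14].

```python
from itertools import combinations


def support_pct(itemset, transactions):
    """Percentage of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for items in transactions if wanted <= set(items))
    return 100 * hits / len(transactions)


def all_subsets_frequent(itemset, transactions, min_support):
    """Check that every proper, non-empty subset meets `min_support`."""
    items = sorted(itemset)
    return all(
        support_pct(subset, transactions) >= min_support
        for size in range(1, len(items))
        for subset in combinations(items, size)
    )


# Hypothetical toy data: "ring" occurs in only one of four transactions.
toy = [
    ["bread", "milk"],
    ["bread", "milk", "ring"],
    ["bread"],
    ["bread", "milk"],
]

# {bread, milk} is frequent at 50%, and so is each of its subsets.
print(support_pct(["bread", "milk"], toy))                # 75.0
print(all_subsets_frequent(["bread", "milk"], toy, 50))   # True

# "ring" fails the 50% threshold, so no superset of it can pass either.
print(support_pct(["ring"], toy))                         # 25.0
print(support_pct(["bread", "ring"], toy))                # 25.0
```

Because a superset can never have a higher support than any of its subsets, an item that fails the threshold at the first level is pruned together with every itemset that contains it, which is how potentially interesting rare itemsets are lost.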

An alternative approach for producing rare patterns is the relative support Apriori algorithm (RSAA)
proposed by Elahe et al [15]. This algorithm uses three user-specified supports known as first support,
second support, and relative support. If the support value of an item is greater than or equal to the first
support value, it is called a frequent item. If the support value of an item is smaller than the first support
value but larger than or equal to the second support value, it is called a rare item; itemsets containing rare
items have to satisfy the second support, and their relative support should satisfy the minimum relative
support specified by the user.
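
The two-threshold labelling of single items described above can be sketched in Python as follows. The function name `classify_item`, the threshold values, and the label "discarded" for items below the second support are assumptions made for illustration; the relative support check for itemsets is not covered because its formula is not given here.

```python
def classify_item(support, first_support, second_support):
    """Label one item using the first and second support thresholds."""
    if support >= first_support:
        return "frequent"
    if support >= second_support:
        return "rare"
    # Assumed label: the text does not name items below the second support.
    return "discarded"


# Hypothetical thresholds (in percent): first support 50, second support 10.
print(classify_item(60, 50, 10))  # frequent
print(classify_item(20, 50, 10))  # rare
print(classify_item(5, 50, 10))   # discarded
```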

Bhatt et al [16] proposed an extension of the RP-Tree algorithm called the Maximum Constraint Rare
Pattern Tree algorithm. This algorithm takes the transactional data set as input. Using a predefined MIS value
for each item, this technique finds the rare itemsets in the data set. The tree stores only the transactions that
contain a rare itemset. The approach keeps only rare items and prunes the other items from the transaction at the
time of tree construction. Because the algorithm relies on tree generation while mining the rare itemsets, and
tree generation requires inserting nodes, the process may be expensive.


1.2. The Problem 

In wholesale production [17], market-basket analysis aims to determine which items are
bought together so as to track the buying behaviour of consumers. In market-basket analysis, some sets of
items, such as toothpaste and toothbrush, occur frequently. Compared with milk and bread, some items
such as a chain and a gold ring form rarely occurring itemsets, but they are considered a significant
association. We may also discover some infrequent associations that we could not foresee. The problem of finding
rare items has recently caught the attention of the data mining community.

The single minimum-support model assumes that all items have similar frequencies in the data set. In
several real applications, we encounter the following problems [18]:

a) If the minimum support is set to a high value, we are unable to find the rules that contain rare
itemsets.
b) To find both rare and frequent items, it is necessary to set a low minimum support value, but this
produces a large number of frequent patterns that are not valuable.

Rare items can be used in several areas such as text mining, where indirect associations can be used to find
synonyms and antonyms that are used in different contexts. Infrequent patterns can also be used to identify
errors.

Infrequent itemsets have significant uses in [19]:

(i) mining of negative association rules from rare itemsets; (ii) statistical disclosure risk
assessment, where rare patterns in anonymous census data can lead to statistical disclosure; (iii) fraud detection,
where rare patterns in financial data may suggest unusual activity associated with fraudulent behaviour;
(iv) bioinformatics, where rare patterns in microarray data may suggest genetic disorders. Rare items carry
extremely interesting information in several domains, including medicine and biology.


1.3. Proposed Solution 
In this paper, we design a new algorithm whose purpose is to mine rare
itemsets from a transactional database. This method also overcomes the following limitations of existing
approaches:
a) Apriori-Rare and Apriori-Inverse are not able to find all rare itemsets.
b) The ARIMA algorithm is able to generate rare itemsets, but the rules produced from them are not all
interesting.
c) Existing algorithms that use tree generation may be expensive.
d) Various rare itemsets may be ignored or discarded when the downward closure property is used.
The paper is organized as follows: Section 2 presents the proposed algorithm, experimental
results are shown in Section 3, and the conclusion is given in Section 4.


2. PROPOSED METHOD 

After reviewing and analysing the various algorithms, we observe that the algorithms mentioned
above have some drawbacks. The existing algorithms Apriori-Rare and Apriori-Inverse are not able to find all
rare itemsets. The existing ARIMA procedure is able to generate rare itemsets, but the rules
produced from them are not all interesting. To remove the drawbacks of these methods, we suggest a
new method that follows a bidirectional approach. The new method takes the dataset and a minimum
threshold as input and yields rare itemsets as output. It also helps in pruning the candidates. The steps of
the proposed approach are as follows:


2.1. Steps: 
a) Scan the database only once to obtain the real support of the items.
b) Arrange the items in each transaction in ascending/descending order according to the multiple support
thresholds.
c) Calculate the MIS value for each item in the transaction data set:
\[
MIS(i_j) =
\begin{cases}
\beta \, S(i_j) & \text{if } \beta \, S(i_j) > LS \\
S(i_j) & \text{otherwise}
\end{cases}
\]
where β is a user-defined value between zero and one; S(i_j) denotes the percentage support of
item i_j, equal to f(i_j) / N × 100; and LS is the user-defined least support value.
d) Find the least minimum support threshold.
e) Find the rare itemsets: an itemset is rare if its support is less than LS and larger than or equal to the
least minimum support threshold.
f) Find the transactions in the transaction data set that have at least one rare itemset.
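
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `mine_rare_items` is hypothetical, step b (ordering) is omitted, the transactions are those of Table 1, and the MIS rule is coded exactly as printed in step c. Step e is worded with a strict "less than LS", whereas the worked example in Section 2.2 counts items whose support equals LS as rare, so the sketch assumes an inclusive upper bound.

```python
from collections import Counter

# Transactions of Table 1.
TRANSACTIONS = {
    "T1": ["D", "C", "A", "F"],
    "T2": ["G", "C", "A", "F", "E"],
    "T3": ["B", "A", "C", "F", "H"],
    "T4": ["G", "B", "F"],
    "T5": ["B", "C"],
}


def mine_rare_items(transactions, beta, ls):
    """Hypothetical helper following steps a, c, d, e and f."""
    n = len(transactions)

    # Step a: a single scan counts the transactions containing each item.
    counts = Counter(
        item for items in transactions.values() for item in set(items)
    )
    support = {item: 100 * count / n for item, count in counts.items()}

    # Step c: MIS as printed (beta * S if that exceeds LS, otherwise S).
    mis = {}
    for item, s in support.items():
        scaled = beta * s
        mis[item] = scaled if scaled > ls else s

    # Step d: least minimum support threshold over all items.
    least_mis = min(mis.values())

    # Step e: rare items. ASSUMPTION: inclusive upper bound (s <= LS).
    rare = sorted(item for item, s in support.items() if least_mis <= s <= ls)

    # Step f: keep the transactions that contain at least one rare item.
    rare_set = set(rare)
    selected = {}
    for tid, items in transactions.items():
        found = sorted(rare_set.intersection(items))
        if found:
            selected[tid] = found
    return support, mis, least_mis, rare, selected


support, mis, least_mis, rare, selected = mine_rare_items(
    TRANSACTIONS, beta=0.7, ls=20
)
print(rare)      # ['D', 'E', 'H']
print(selected)  # {'T1': ['D'], 'T2': ['E'], 'T3': ['H']}
```

With β = 0.7 and LS = 20, the sketch returns a least minimum support threshold of 20 and the rare items D, E, and H, found in transactions T1, T2, and T3 respectively.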


2.2. Flow Chart 
The flow of the proposed approach is shown in Figure 1.
Figure 1. Flow chart of the proposed approach


Example: Given a transaction database DB as shown in Table 1, with the minimum support
(LS) fixed at 20% and β = 0.7, the multiple item supports of the items are given in Table 2.


Table 1. Transactions Containing Various Items
TID   Items
T1    D, C, A, F
T2    G, C, A, F, E
T3    B, A, C, F, H
T4    G, B, F
T5    B, C





Table 2. Calculation of Support %age
Items          A    B    C    D    E    F    G    H
MIS            42   42   56   20   20   56   28   20
Support %age   60   60   80   20   20   80   40   20











As per the proposed approach, the actual support percentages and minimum support threshold values
calculated for the various items are shown in Table 2. MIS values are calculated according to the formula
described in step c of the proposed approach. Note that if β = 1 and S(i_j) ≥ LS, the minimum support threshold
of an item is its real support S(i_j), whereas if β = 0, there is only a single minimum support. The
parameter β is determined by the formula [9]: β = 1/c. In Table 3, the results are arranged and
stored according to the minimum support threshold values.
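
As a worked check of the MIS values in Table 2, applying the step c formula with β = 0.7 and LS = 20 to item A (support 60%), item G (support 40%), and item D (support 20%) gives:

\[
\beta \, S(A) = 0.7 \times 60 = 42 > 20 \;\Rightarrow\; MIS(A) = 42
\]
\[
\beta \, S(G) = 0.7 \times 40 = 28 > 20 \;\Rightarrow\; MIS(G) = 28
\]
\[
\beta \, S(D) = 0.7 \times 20 = 14 \le 20 \;\Rightarrow\; MIS(D) = S(D) = 20
\]

The same calculation gives 56 for C and F, 42 for B, and 20 for E and H.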

The least minimum support threshold value is 20. D, E, and H are rare items, as their support (20%) does
not exceed the least support value (LS) and is larger than or equal to the least minimum support threshold. In the
next step, we select the transactions having at least one rare item, as shown in Table 4. Table 3 shows the
items arranged according to their minimum support thresholds:


Table 3. Arrangement of Items According to Minimum Support Threshold
Items          C    F    A    B    G    D    E    H
MIS            56   56   42   42   28   20   20   20
Support %age   80   80   60   60   40   20   20   20
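
The ordering in Table 3 can be reproduced with a short Python sketch. The dictionary `mis` copies the MIS row of Table 2, and the helper name `order_by_mis` is hypothetical. The tie-breaking rule (alphabetical order among items with equal MIS) is an assumption chosen because it matches the table; the text does not state it.

```python
# MIS values (in percent) copied from Table 2.
mis = {"A": 42, "B": 42, "C": 56, "D": 20, "E": 20, "F": 56, "G": 28, "H": 20}


def order_by_mis(mis_values, descending=True):
    """Sort items by MIS; ties are broken alphabetically (an assumption)."""
    sign = -1 if descending else 1
    return sorted(mis_values, key=lambda item: (sign * mis_values[item], item))


ordered = order_by_mis(mis)
print(ordered)  # ['C', 'F', 'A', 'B', 'G', 'D', 'E', 'H']
```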
Table 4. Selection of Rare Items from Transaction
TID          T1   T2   T3   T4   T5
Rare items   D    E    H    -    -











3. RESULTS AND DISCUSSION 

Here, the proposed method is compared with the existing versions for discovering rare itemsets
under multiple support thresholds. To verify the effectiveness and efficiency of the new method, several
experiments are conducted using various datasets with different features. In these tests, we measure the
performance with respect to time and memory space.


3.1. Experimental Environment and Datasets 

We conduct the experiments on different kinds of datasets to assess the performance and
efficiency of the proposed method. The experiments are executed and tested on an Intel Core 2, 2.00 GHz machine
with a 64-bit operating system, and the method is implemented in the Python programming language.
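
Runtime and memory of a Python function can be measured as in the following sketch. The measurement tools used in the experiments are not stated, so the use of the standard-library modules `time` and `tracemalloc`, and the helper name `measure`, are assumptions made for illustration.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run `func` once and return (result, seconds, peak memory in MiB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) sizes in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak_bytes / (1024 * 1024)


# Usage with a stand-in workload; any mining function can be passed instead.
result, seconds, peak_mib = measure(sorted, [3, 1, 2])
print(result)
print(f"{seconds:.6f} s, {peak_mib:.3f} MiB")
```

`tracemalloc` only tracks memory allocated by the Python interpreter, so its figures can differ from process-level measurements.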

We used three real-world datasets (Monk1, Chess, and Cancer). The real-world datasets are taken
from the FIMI repository [14]. The important characteristics of the real-world datasets and the results obtained
on them are given in Tables 5-8.


Table 5. Characteristics of Datasets
Data Set   Instances   Attributes
CHESS      3024        37
MONK1      432         5
CANCER     569         30





Table 6. Results Obtained for Chess
Data Set Size (instances)   Time Execution (seconds)   No of Rare item sets   Memory consumed (MiB)
200                         0.85                       25                     1
1000                        0.9                        18                     0.8
1500                        0.87                       17                     0.8
3000                        0.9                        7                      0.9





Table 7. Results Obtained for Cancer
Data Set Size (instances)   Time Execution (seconds)   No of Rare item sets   Memory consumed (MiB)
560                         0.71                       3                      1











Table 8. Results Obtained for Monk1
Data Set Size (instances)   Time Execution (seconds)   No of Rare item sets   Memory consumed (MiB)
432                         0.6                        4                      1











The execution time comparison of the proposed and various existing versions is given in Tables 9-10.
The performance of the proposed algorithm and of several existing algorithms is measured on the specified
datasets. Note that the execution time means the total runtime, i.e. the period between input and output. The
experimental results reveal that the proposed method is substantially faster than earlier versions.


Table 9. Comparison Between Proposed & Existing System (Chess Dataset)
Method                Time Execution (seconds)   No of Rare item sets
MSApriori             34                         7
Existing [Gandhi P]   13                         7
Proposed              0.9                        17





The comparison for the Chess dataset is shown in Figure 2, and the comparison for the Cancer dataset is
shown in Figure 3.
Figure 2. Comparison for chess dataset


Table 10. Comparison Between Proposed & Existing System (Cancer Dataset)
Method               Time Execution (seconds)   No of Rare item sets
Apriori              1                          0
Existing [Hoque N]   1                          0
Proposed             0.7                        3





Figure 3. Comparison for cancer dataset (series: Time Execution (seconds), No of Rare item sets; methods: Apriori, Existing [Hoque N], Proposed)
The comparison between the proposed and existing systems for the Monk1 dataset is shown in Table 11.


Table 11. Comparison between proposed & existing system (Monk1 dataset)
Method                Time Execution (seconds)   No of Rare item sets
Apriori               1                          0
Existing [Gandhi P]   1.5                        5
Proposed              0.6                        4





The comparison for the Monk1 dataset is shown in Figure 4.


Figure 4. Comparison for Monk1 dataset (series: Time Execution (seconds), No of Rare item sets; methods: Apriori, Existing [Gandhi P], Proposed)
From the experimental results, it is observed that the proposed algorithm gives better results than
the previous algorithms. For the Chess dataset, the proposed system gives better results, i.e. the number of
rare itemsets is higher and the time consumption is lower. For the Cancer dataset, the previous algorithms find
nothing, whereas our method finds rare itemsets. For the Monk1 dataset, however, the previous algorithm finds
more rare itemsets than ours, but it needs much more time and the same number of database scans as Apriori.


4. CONCLUSION 

A single minimum support is inadequate for association rule mining, as it is unable to reflect the
frequency differences of the different items in the database. In real applications, such differences
can be very large. It is neither acceptable to set the minimum support too high, nor appropriate to set it
too low. In this paper, we have explored the problem of using item-specific minimum supports, which permit the
user to specify multiple minimum item supports. To address this problem, we have proposed an algorithm that
is capable of extracting rare patterns efficiently. We have assessed the performance of the proposed algorithm
by testing it on various datasets. The results above show that the proposed algorithm overcomes
the rare item problem and gives the user a more flexible and powerful model to specify minimum supports
for rare items. Thus, the proposed algorithm allows us to mine rare patterns without generating uninteresting
and redundant patterns.